SMART-INT: A SYSTEM FOR ANSWERING QUERIES OVER WEB DATABASES USING ATTRIBUTE DEPENDENCIES

Authors

  • Anupam Khulbe
  • Jieping Ye
  • Subbarao Kambhampati
Abstract

Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, one must integrate the information about individual entities that is fragmented over multiple tables. At first blush this is just the inverse of the traditional database normalization problem: the universal relation is to be reconstructed from the given tables (sources). However, the tables may be missing primary key-foreign key relationships, which leads to technical challenges. Reconstructing and retrieving the relevant entities clearly requires joining the tables. While the tables do share attributes, direct joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. This work addresses the problem of data integration in such scenarios. To make up for the missing primary key-foreign key relationships, the approach mines and uses attribute dependencies, which can be both intra-table and inter-table. Given a query, these dependencies are used to piece together a tree of relevant tables and schemes for joining them. For experimental evaluation, a master table was fragmented horizontally and vertically to generate a set of overlapping tables. A set of randomly generated queries was issued against both the master table and the fragmented tables. The results from the master table served as the ground truth against which the results of this approach were evaluated in terms of precision and recall. Answering from a single table and direct joins were used as baselines. The result tuples produced by this approach are shown to strike a favorable balance between precision and recall.
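The effect described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy car table, the column names, and the helper functions are hypothetical and are not taken from the SMART-INT system. The dependency score is one common way to measure how well an approximate functional dependency holds (the fraction of rows that survive once each left-hand value keeps only its most frequent right-hand value); the abstract does not say which measure the authors use. The sketch fragments a master table vertically, joins the fragments directly on a shared non-key attribute, and scores the answer against the master table.

```python
from collections import Counter, defaultdict

# Hypothetical master table; columns are (model, make, price_band).
COLS = ("model", "make", "price_band")
MASTER = [
    ("Civic", "Honda", "low"),
    ("Accord", "Honda", "mid"),
    ("Corolla", "Toyota", "low"),
    ("Camry", "Toyota", "mid"),
]


def project(rows, names):
    """Vertical fragment: keep only the named columns, dropping duplicates."""
    idx = [COLS.index(n) for n in names]
    return {tuple(r[i] for i in idx) for r in rows}


def direct_join_on_make(t1, t2):
    """Join t1(model, make) with t2(make, price_band) on the shared 'make'."""
    return {
        (model, make, band)
        for model, make in t1
        for make2, band in t2
        if make == make2
    }


def precision_recall(answer, truth):
    """Score an answer set against the ground-truth set of tuples."""
    hits = len(answer & truth)
    precision = hits / len(answer) if answer else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


def afd_confidence(rows, lhs, rhs):
    """Assumed measure: share of rows kept when every lhs value is mapped
    to its single most frequent rhs value (1.0 means an exact dependency)."""
    li, ri = COLS.index(lhs), COLS.index(rhs)
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[li]][r[ri]] += 1
    kept = sum(max(c.values()) for c in groups.values())
    return kept / len(rows)


# Fragment the master table, then rebuild it with a direct join.
t1 = project(MASTER, ("model", "make"))
t2 = project(MASTER, ("make", "price_band"))
joined = direct_join_on_make(t1, t2)

# Query: all cars in the "low" price band.
truth = {r for r in MASTER if r[2] == "low"}
answer = {r for r in joined if r[2] == "low"}
precision, recall = precision_recall(answer, truth)

print(len(joined), precision, recall)
print(afd_confidence(MASTER, "make", "price_band"))
print(afd_confidence(MASTER, "model", "make"))
```

In this toy setting the join on `make` recovers every true tuple but also pairs each model with every price band of its make, so recall stays high while precision drops. The low score for `make` determining `price_band` signals that `make` is a weak attribute to join on, which is the kind of evidence a dependency-driven approach could use when choosing how to combine tables.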

Similar resources

On the finite controllability of conjunctive query answering in databases under open-world assumption

In this paper we study queries over relational databases with integrity constraints (ICs). The main problem we analyze is OWA query answering, i.e., query answering over a database with ICs under open-world assumption. The kinds of ICs that we consider are inclusion dependencies and functional dependencies, in particular key dependencies; the query languages we consider are conjunctive queries ...

Ontograte: towards Automatic Integration for Relational Databases and the Semantic Web through an Ontology-Based Framework

Integrating existing relational databases with ontology-based systems is among the important research problems for the Semantic Web. We have designed a comprehensive framework called OntoGrate which combines a highly automatic mapping system, a logic inference engine, and several syntax wrappers that inter-operate with consistent semantics to answer ontology-based queries using the data from he...

Supporting Queries with Imprecise Constraints

In this paper, we motivate the need for and the challenges involved in supporting imprecise queries over Web databases. Then we briefly explain our solution, AIMQ, a domain-independent approach for answering imprecise queries that automatically learns query relaxation order by using approximate functional dependencies. We also describe our approach for learning similarity between values of categoric...

IQPI: An Incremental System for Answering Imprecise Queries Using Approximate Dependencies and Concept Similarities

Most of the proposed systems to process queries over web databases require the user to provide some information regarding the relative importance of attributes and the similarities between nominal values. Recently, a new system called AIMQ has been proposed, which is based on measuring concept similarities. This system is end-user independent and can answer imprecise queries. The main drawback ...

Answering Recursive Queries under Keys and Foreign Keys is Undecidable

Query answering in the presence of integrity constraints is a fundamental problem in several settings, such as information integration. Keys, foreign keys and inclusion dependencies are the most common forms of constraints used in databases. It has been established recently that, in the presence of such constraints, query answering is decidable for non-recursive queries. Obviously, in the absen...

Publication date: 2009